-
Notifications
You must be signed in to change notification settings - Fork 29k
SPARK-7237 Clean function in several RDD methods #5959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that mappartitionsWithIndex cleans, and thus anything in that is cleaned. maybe it's an incorrect assumption. cc @andrewor14?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Reynold is correct.
I can update the PR for collect() and undo the change for other methods touched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think collect might've been cleaned in DAGScheduler's runJob. Double check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that @andrewor14 added some tests for closure cleaning as part of his recent ClosureCleaner patch; we might check whether that test suite covers these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test suite does not currently cover these methods because they are deprecated, but maybe we should just add them. (@JoshRosen is referring to ClosureCleanerSuite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin @ted-yu actually even though mapPartitionsWithIndex does clean already, it cleans the whole closure but not the ones used in the closure. In this case, I believe it's actually necessary to clean f here since we won't actually clean it from mapPartitionsWithIndex. For the same reason I believe we also need to clean constructA since it's a closure provided by the user.
|
Hey @ted-yu would you mind adding |
|
ok to test |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32138 has started for PR 5959 at commit |
|
@ted-yu Also it would be good to add test cases for the particular methods you are testing in |
|
Test build #32138 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32150 has started for PR 5959 at commit |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32151 has started for PR 5959 at commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that the tests will fail because these closures are expected to have return statements in them. Can you model these closures after the other ones in this test?
|
Hi @ted-yu I believe this also needs to cover |
|
Test build #32151 has finished for PR 5959 at commit
|
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
Test build #32150 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test build #32236 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
@rxin @andrewor14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this commented out?
|
From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32195/consoleFull : [info] - filterWith *** FAILED *** (13 milliseconds) |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32275 has started for PR 5959 at commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ted-yu I investigated this a little and this won't work because you're bringing in the whole SparkContext into the closure. Instead we need to clean the closures outside of mapPartitionsWithIndex as you have done in other methods. There is nothing special about filterWith.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you have done that please uncomment the tests
|
Should have looked closer :-) |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #32279 has started for PR 5959 at commit |
|
Test build #32275 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Great, merging this into master and 1.4. Thanks @ted-yu. |
Author: tedyu <[email protected]> Closes #5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods (cherry picked from commit 54e6fa0) Signed-off-by: Andrew Or <[email protected]>
|
Test build #32279 has finished for PR 5959 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
Author: tedyu <[email protected]> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
Author: tedyu <[email protected]> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
Author: tedyu <[email protected]> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
No description provided.